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Abstract 

Energy minimization has been an intensely studied core problem in computer vision. With growing 
image sizes (2D and 3D), it is now highly desirable to run energy minimization algorithms in parallel. 
But many existing algorithms, in particular, some efficient combinatorial algorithms, are difficult to par¬ 
allelize. By exploiting results from convex and submodular theory, we reformulate the quadratic energy 
minimization problem as a total variation denoising problem, which, when viewed geometrically, enables 
the use of projection and reflection based convex methods. The resulting nrin-cut algorithm (and code) is 
conceptually very simple, and solves a sequence of TV denoising problems. We perform an extensive em¬ 
pirical evaluation comparing state-of-the-art combinatorial algorithms and convex optimization techniques. 
On small problems the iterative convex methods match the combinatorial max-flow algorithms, while on 
larger problems they offer other flexibility and important gains: (a) their memory footprint is small; (b) 
their straightforward parallelizability fits multi-core platforms; (c) they can easily be warm-started; and 
(d) they quickly reach approximately good solutions, thereby enabling faster “inexact” solutions. A key 
consequence of our approach based on submodularity and convexity is that it is allows to combine any ar¬ 
bitrary combinatorial or convex methods as subroutines, which allows one to obtain hybrid combinatorial 
and convex optimization algorithms that benefit front the strengths of both. 

* Based on work originally submitted on 14 November, 2014. 
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1 Introduction 


Energy minimization has become a key element in many low- to mid-level tasks in computer vision, such as 
segmentation or stereo correspondence (see O for a survey). For many, frequently occurring such minimiza¬ 
tion problems, graph cut techniques have emerged as generic, very efficient tools that provide global optima 
for submodular quadratic penalties, and extend to several higher-order and non-submodular potentials too. 
However, when applying widely used graph-cut code (e.g., CD) to huge problems in 3D or video, running 
time and memory usage become problematic. Ideally, we would wish to have algorithmic flexibility to de¬ 
compose the problem into (almost) arbitrary subproblems that can be solved in parallel, and that adapt to new 
architectures such as GPU clusters. These latter needs can be met through convex optimization. 

In this paper, we explore methods that allow combining convex and combinatorial optimization, and thereby 
offer a way to parallelize recent successful combinatorial methods mmm. Our algorithms can run on 
large datasets while using only limited memory, and are flexible enough to be ported to different hardware 
architectures. 

We begin by rewriting the energy minimization problem as a convex optimization problem. More precisely, 
we consider the equivalent reformulation of the min-cut problem as the minimization of a unary term w 1 x 
plus the total variation f{x) associated to a graph, over the vertices of the hypercube x £ {0,1}". The 
straightforward convex relaxation replaces the set of vertices {0,1}" of the hypercube with the full hypercube 
[0, l] n . This relaxation is tight, a fact that has been exploited widely. Moreover, typical graph structures that 
occur in low-level vision problems naturally decompose as a superposition of chains, suggesting the use of 
iterative convex optimization methods lf32l [391 . However, the lack of smoothness (in the primal and dual 
problems) poses some difficulties in solving the problem efficiently. 

Hence, we instead consider a reformulation through total variation denoising outlined by l20l. [23 1 for general 
submodular functions, which we use for cuts in this paper. The gist of this approach is to replace the non¬ 
smooth relaxation by the total variation (TV) denoising problem min^gR.. ^\\x — w || 2 + f(x), from which 
one may obtain an optimal solution of the energy minimization problem by thresholding. The benefit of this 
formulation is its smooth dual problem, which has a natural geometric interpretation: it reduces to computing 
the distance between two convex sets. Moreover, via the equivalence between energy minimization and 
projections we obtain fast projection subroutines, which in turn enable the use of classical, popular projection 
methods l23l . 

While we focus on cuts in this paper, our framework extends to all higher-order submodular potentials, non- 
submodular quadratic potentials, and to multi-label problems (see Section [4~4| for more details). We make the 
following contributions: 

- We study algorithms that allow to easily combine convex and combinatorial optimization (e.g., graph 
cut, total variation, algorithms for higher order submodular potentials), and to parallelize existing fast 
energy minimization methods for submodular potentials. In fact, this approach applies to arbitrary 
sums of submodular functions, each defined on all or a subset of the variables. 

- As a case study, we specialize the optimization framework of ll23l to the submodular energy functions 
most commonly used in computer vision, such as quadratic functions with a 2D or 3D grid structure 
with various connectivities (Section [3.1| >. The resulting min-cut algorithm (and code) is conceptually 
very simple and may thus be easily used and modified. 

- We perform an extensive empirical evaluation of serial and parallel, combinatorial and convex min-cut 
codes on benchmarks GUO. We observe that iterative techniques based on convex optimization 
match combinatorial max-flow algorithms on small problems, and on larger problems they offer com¬ 
plementary flexibility: (a) they have a reduced memory footprint and can tackle many problems where 
traditional methods run out of memory, (b) they parallelize easily on multi-core platforms, (c) they 
can be efficiently warm-started, (d) they quickly reach approximately good solutions, thereby enabling 
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faster “inexact” solutions. These observations suggest to marry convex and combinatorial methods to 
attain “the best of both worlds”. 


2 Convex optimization for graph cuts 


In this paper, we focus on energy minimization problems with pairwise potentials, 

E Tl « Tl 

wiXi + y ipij(xi,Xj ), 

i—\ ^ i 1 


( 1 ) 


where the variables Xi take values in a set of discrete labels. For simplicity, we here focus on binary labels, 
Xi £ {0,1}. The algorithms we discuss may be generalized to multiple labels via move-making strategies 
031- Moreover, we assume the pairwise potentials to be submodular, i.e., ipij{ 0,1) + fij (1 ,0) > ifiji 0, 0) + 
ipij( 1,1). (One may extend to non-submodular potentials via roof duality lf38l ). It is well known that all such 
submodular energy functions may be written as graph cut functions with nonnegative “edge weights” a,; ? -, up 
to a constant l35l : 


E(x) 
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( 2 ) 


This function consists of two parts: (1) a sum of unary potentials — w i x i = — w T x\ and (2) the sum 
of pairwise potentials, which is equivalent to a weighted graph cut between the set of indices i in {1,..., n} 

for which Xi = 1, and its complement. For x £ R n , this sum is the total variation function f(x) = 
j=i a ij\xi~Xj\. Note that this is a case of anisotropic weighted total variation. Since the weights Oy are 
non-negative, the function / is convex. We refer to the graph cut problem as the discrete problem : 


min fix) — w T X. (D) 

xe{o,i}" 


2.1 Convex reformulation 

We obtain a relaxation to the combinatorial problem in Eq. (Jd]> by replacing {0,1}" by its convex hull [0, 1 ]: 

min *e[o,i]" f( x ) - w T x. (C) 

We refer to Eq. ([C]) as the continuous problem. This relaxation is exact : since the continuous convex problem 
in Q is a minimization problem over a larger set than the discrete problem, its minimal value has to be lower 
than ||D). However, as a consequence of properties of the total variation and its relation to submodular graph 
cut functions (see, e.g., |2l Sec. 3.3] or fl20lfl2i for a proof dedicated to cut functions), the two optimal values 
are equal and a solution to 0 may be obtained from a solution x £ [0, l] n of 0 by looking at all “level 
sets” of x , that is by rounding the values of x to zero or one by thresholding at a given level in [0,1] (there 
are at most n possible thresholds, which can be obtained by first sorting the components of 

2.2 Convex duality 

The total variation / is convex and absolutely homogeneous, that is, for any x £ R” and A € 1, /(Aa;) = 
|A| fix). For all such functions, there exists a centrally symmetric convex body K C R n such that for all 

x £ M n El §13], 

f(x) = max ye if y T x. 

Note that when f(x) happens to be equal to zero only for x = 0, then f is a norm and the set I\ is simply the 
unit ball of the dual norm. 
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Since / is piecewise affine, the set K is a polytope (i.e., the convex hull of finitely many points). The set K 
may be described precisely for general submodular functions and is usually referred to as the base polytope 
of the submodular function QS [231121. 

Using Fenchel duality, we arrive at the following dual problem to |0 [ 16li2l: 


min fix) — w T x 

XG[0,1]" 


min ma xy'x — w T x 
®e[o,i]" v&k 

max min x T (y — w) 

yGK a:e[0,l]" 


n 

max \ min{t/j — Wi , 0}. 
1=1 


(3) 


This dual problem allows us to obtain certificates of optimality: given a pair (x, y) £ [0,1]" x K, the quantity 

n 

gap(ar, y) := f(x) - w T x - ^ min{yi - 0} 

i=l 

is always non-negative, and equal to zero if and only if x is optimal for ([0 and y is optimal for the dual 
problem Eq. 0 - This duality relation is essential to certify that a given solution is optimal (and corresponds 
to the traditional min-cut/max-flow duality). 

While the cut problem is now reformulated as a convex optimization problem, it is still hard to minimize 
because neither the primal nor the dual are smooth, and thus iterative methods are typically slow (see de¬ 
tailed comparisons in |[23l ). We now reformulate the problem so that the dual problem becomes smooth and 
potentially easier to optimize. 


2.3 Equivalence to total-variation denoising 

Following 06] El [23] ED, we consider the total variation denoising problem: 

min xGR n f(x) + ^||a:-u;|| 2 . (TV) 

By expanding 3||x — w|| 2 into 4||x|| 2 — w T x + 3||w|| 2 , we see that going from the continuous problem 
( P to {TV} means replacing the constraint x £ [0, l] n by the penalty 5 ||x|| 2 = ^ ]C"=i x i- This has a 
number of important consequences: (1) It makes the optimization problem strongly convex and thus the dual 
problem will be smooth; see Eq. 0. (2) A solution to © and hence ([0 may be obtained by thresholding 
the unique solution x of ( |TV| ) at zero, that is, by defining Xi = 1 if Xi > 0 and Xi = 0 otherwise. This is 
usually not true for arbitrary convex functions / (even absolutely homogeneous) and is a direct consequence 
of submodularity. 

Importantly, we need not solve the TV problem ( |TV| ) exactly to obtain a solution to ([C]), we only need to 
know which of the components are positive (resp. negative). 

Analogously to Eq. 0, we may derive the dual to ( [TV] ): 

min/(x) + ^||«;-x|| 2 = min maxima; + h\w - x\\ 2 

= max min y T x H— ||tu — xll 2 
y£K xe K" 2" 11 

= max^lMI 2 ^ \\\y~ HI 2 - ( 4 ) 

y£K Z Z 


4 



Figure 1: Grid graph structures and decompositions, indicated by different colors. Left: 2D, 4-connected, 
middle: 2D, 8-connected, right: 3D tensor, 6-connected. 


The primal and dual solutions x and y have a simple correspondence x = w — y, as can be seen from the 
dual derivation. We have now obtained a dual problem which may be simply interpreted as the orthogonal 
projection of the vector w onto the poly tope K. Importantly, a slight additional argumentation shows that the 
discrete energy minimization problem (D), the total variation relaxation (TV), and the projection onto K are 
equivalent. That means, a fast subroutine for one of the problems implies a fast subroutine for the other two. 

While the reformulation is intuitive, orthogonal projections onto the polytope K are not fast to compute in 
general. Many special cases, however, are fast, including graphs that have fast cut subroutines (which may 
be combinatorial). 


3 Decomposition of graphs 


We assume that the total variation / (and equivalently the energy function) may be decomposed into a sum 
]P' _i fj(x) of r total variation functions fj . Each function fj can be defined on 2D grids, trees or chains. 
The only criterion for the decomposition is to efficiently perform orthogonal projections onto the polytope 
Kj corresponding to f 7 . In fact, /, does not need to have full support. 


While other decompositions are possible 


, in this paper, we focus on chains to illustrate our ideas; we 


note that ID-TV (on chains) can be solved very efficiently-see Section 3.2 Note that the same idea works 
for decompositions into trees, stars, small “cubes” and 2D sheets. 


3.1 2D and 3D grids 

The most frequenly used 2D and 3D topologies in computer vision decompose straightforwardly in several 
ways ll42l f32l 1301L391. In our experiments, we split the 2D 4-connected grid into vertical and horizontal lines 
(Figure |T|left), and the 2D 8-connected grid additionally into zig-zagged paths (Figure [I] middle). Diagonal 
lines are also possible but less efficient with respect to memory access. 

Even though the set of ID lines of equal color in the figures may be considered one fj, the lines are inde¬ 
pendent and may be addressed in parallel. Further parallelization (across colors) results from our splitting 
formulation. Finally, any fast algorithm for solving a 2D or 3D TV (or graph cut) problem may be used as a 
subroutine when setting fj to be an entire 2D grid sheet or 3D tensor. 
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3.2 Efficient ID TV 


As chains are the building blocks of our decomposition, performance of the overall method is heavily influ¬ 
enced by the speed at which ID-TV can be solved. Being a classic regularizer for image denoising, literature 
on solvers for different variations of ID and 2D TV abounds, though only recently fast direct methods for 
chains have been proposed. A notable example is the dynamic programming method of Johnson 1241 . which 
guarantees linear complexity. Another outstanding method is that of Condat 133; it is based on a thorough 
analysis of the KKT conditions and manages to achieve faster running times in practice, despite a pathological 
quadratic cost worst-case. These TV solvers, however, only apply to chains with constant weights. 

To permit varying weights we use a recent method of a that obtains Condat’s method through a taut-string 
viewpoint llT8l . in a way that allows weights along the chain. Experiments 0 indicate that this method shares 
the same performance as the original procedure, therefore rapidly solving TV chains in linear time in practice. 

We also point out that the choice of the ID-TV solver is independent of the overlying topologies and opti¬ 
mizers. This allows us to localize complexity to highly tuned TV chain solvers for the architecture under use 
(multicore, GPUs, etc.), thus providing overall modularity and adaption to the underlying hardware. In this 
paper, we use a general implementation for CPUs. 


Message passing. Alternatively to the method that we have used, we may also use message passing tech¬ 
niques, which could be more efficient on certain architectures. These are directly adapted to solve the min-cut 
problem on a chain or a tree, not the total variation problem. However, it is known that by a sequence of at 
most n min-cut problems, one may obtain the exact TV solution l22l . 


3.3 Decomposed dual problems 

By their form (total variation or Lovasz extensions of submodular functions), the /, may be represented as a 
maximum of linear functions, that is, fj(x) = ma x Vje j( j yjx, for Kj a certain polytope, j £ {1,..., r}. 
This form as well as the decomposability of the total variation may be used to obtain a decomposed dual 
problem for the continuous Problem 0 - The dual splits in the same way as the primal, and admits par¬ 
allel optimization algorithms |42] |32l l30l [39l ■ It however has a non-smooth objective function that makes 
optimization harder. 

We hence next describe two dual problems for the (TV) problem l23l . 


First dual problem. We use a standard reformulation for dual decomposition: we introduce a variable 
x = (ati,..., x r ) £ R" x • • • x R" composed of r copies of the input variable x, with the constraint that 
Xi = x for each i £ {1,..., r} (see, e.g., j9j)- We then add Lagrange multipliers A; £ R n for each of these 
constraints. Writing f 3 as a maximum of linear functions introduces a dual variable vector y :j £ Kj. We 
collect those variables in a vector y = (j/i,..., y r ) £ K = K\ x • • ■ x K r . Overall, we obtain the following: 


min ^2fj(x) +-\\w-x\\ 2 
J=i 


max 

AGM nXr , y£K 


mm 

n , xGl 


, x J yj + aIHI 2 - wTx + 4 ini 2 + 


3 =1 


3= 1 


3=1 


Xj (x - Xj) 


= max 

AeL, y<EK 



3 = 1 
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Figure 2: Illustration of alternating projections (AP) and averaged alternating reflections (AAR) for Prob¬ 
lem ([5]), the problem of finding the closest points between a polytope K and a subspace L. The iterations 
start at the black point z°. AP alternatingly projects onto K and then L. AAR reflects at K and then at L; the 
next iterate z t+1 (magenta) is the midpoint between z f and /(l /i'k(^)- For AAR, the sequence z 1 diverges, 
but the projected shadow sequence of y* = IIk(V) converges (red). Here, AAR takes larger steps than AP 
and hence converges more quickly. 


= max 

AeL. yGK 



2 lly — A ll 2 ’ 


(5) 


E v 

^ A j = w. We are thus faced with the problem of 


finding the closest point between two convex sets, which we explore in Sections 4.2 and 4.3 


Note that if the function f :j only depends on a subset of variables of x, then we may restrict the corresponding 
variable A, to be zero on the complement of that subset in order to have faster convergence for the iterative 
methods presented below. 


Second dual problem. Given y € K in (|5j, the optimal A £ L may be obtained in closed form: 


W 1 v—’ 

= ~ + vo - - 2 ^ Vk - 

&=1 


This leads to another dual problem, where the variables A £ L are maximized out: 


max -llio 
yeK 2 11 


1II 


( 6 ) 

(7) 


The problem above has separable constraints y, £ Kj, j £ {1,..., r} and a smooth objective function. We 


will discuss optimization procedures in Section 4.1 


Special case r = 2. When the function / is split into two functions, then the problem in Eq. (jTji is equivalent 
to finding the distance between the convex set K\ and the set {w — 3/2 | Vi £ -K2 }, for which methods 
presented in Sections |4.2|and [43] may be used. 


4 Optimization for decomposable problems 

Next, we describe optimization procedures that exploit the decomposable structure of the dual problems ([5]) 
and ([7]), where in particular the Cartesian decomposition K = Ki x • • • x K r of the constraint sets plays an 
important role. In particular, we exploit that the projection onto K consists of r independent projections onto 
the sets K, . By the above derivations, each of those projections can be done quickly via a TV subroutine for 
each /j. 
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4.1 Cyclic projections 


The first method we consider for problem (0 is block coordinate descent (BCD), a classic method J7] that has 
recently witnessed a huge resurgence of interest in large-scale optimization If36l l6l. Since the cost function 
is a separable quadratic, BCD assumes a form that is more commonly known as cyclic projections (more 
precisely, this is so if we go through the constraint blocks in a cyclic order). Specifically, we update coordinate 
blocks i = 1,..., r as follows: 

Vi <- argmax ~\\z - w + ^ y, :|| 2 = U Ki (w 

zeKi " jjti &i 

where 11^ denotes orthogonal projection onto set K ,. This projection is solved by solving a (fast) TV 
problem with fl. Notice that the variable y.-, is overwritten after the update, so that when updating y l+ -[ , the 
latest values j/i,... ,y» are used in the projection. 

In addition to cyclic projections, one could solve the smooth dual 0 using a gradient-based method like 
FISTA 0. Such a method is also easy to implement because the Lipschitz constant of the gradient is easily 
seen to be r and the required projections decompose due to the structure of K. 


4.2 Alternating projections in product space 

The method of cyclic projections offers a practical choice. However, it is inherently serial. To solve the 
problem in parallel, the first dual formulation 0 turns out to be more suited (note that this provides a second 
source of parallelization, beyond the fact that each polytope Kj is itself a product of poly topes corresponding 
to individual lines). 

The key idea is to exploit the “product space” K i x • ■ ■ x K r . Since w is constant, as previously mentioned, 
0 is nothing but the problem of finding the closest point between two convex sets l23l . Applying BCD, 
except this time with just two coordinate blocks, we obtain the classic alternating projections (AP) (cast in a 
product space setting), which performs for k = 0, 1 ,..., the iteration: 

yfc+i ar g max _ ||y — A fe || 2 = n K (A fc ), 

y£K 2 

A fc+1 ^ argmax -U\ - y fc+1 || = H L (y fc+1 ). 
agl 2 

The key point here is that the projection Hk decomposes 

IIk(A) = (n^i (Ai),..., n/f r (A T .)), 

so that each of the coordinate blocks may be computed in parallel (our implementation exploits this fact), 
while the projection Hl is merely an averaging step detailed in 0. 


4.3 Alternating reflections in product space 

The recent work ll23l provided strong experimental evidence that for projection problems of the form 0, AP 
is often outperformed by a more refined method of 0, namely, averaged alternating reflections (AAR). Here, 
instead of alternating between the projection operations Hk and Hl, one uses reflection operators 

f? K := 2n K — I, i?L : = 2n L — I, (8) 
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while averaging them to ensure firm nonexpansivity, a property that greatly simplifies convergence analy¬ 
sis 0 . To apply the AAR method, one first introduces the auxiliary vector z, which represents y — A. Then, 
AAR takes the form 

z k+1 = ±(R L R K + I)z k . (9) 

However since usually K D L = 0, the sequence (z fc ) generated by (|9]) diverges to infinity! The remarkable 
fact is that from this diverging sequence, we can extract a solution by maintaining a “shadow sequence” 
y k = n K (z fc ). See Figured for an illustration, and Theorem 3.13 in |4j for a proof of convergence. 

4.4 Extensions 

Above, we outlined flexible, parallelizable convex optimization algorithms for energy minimization with 
pairwise submodular potentials. These algorithms straightforwardly generalize from binary labels to the 
multi-label case, to submodular higher-order potentials, and to related problems. The reasons are two-fold: 
(1) the above algorithms solve a minimum cut problem, and any methodological machinary that builds on 
graph cuts as a subroutine will work with the above algorithms too; (2) the decomposition theory and tightness 
of the relaxations hold generically for submodular functions, not only graph cuts. 

For multi-label energy minimization, one may use move-making algorithms nn that reduce the multi-label 
problem to a series of binary submodular energy minimization problems. The methods above solve those 
binary problems. For combinatorial algorithms, it has proved useful to reuse existing solutions and data 
structures ll28l . “Warm-starting” is possible for the convex case too: we simply use the ij, vectors of the 
previous problem to initialize the new problem. If the geometry of the polytopes I\ has not changed too 
much, this can save many iterations (see Figure |3]T>)). 

Second, the convex approach directly generalizes to submodular potentials that involve more than two nodes 
at a time (following li23l ); such potentials include |J_2 4] 26 27, 19']. Many of those potentials correspond 
to sufficiently simple submodular functions, often with small support, such that the relaxation (the equivalent 
to total variation for graph cuts) can be solved fast. Moreover, the same methods may even generalize to be 
used with roof duality l38l . 

Finally, since the above methods also solve the parametric version of the discrete problem (by thresholding the 
solution of Eq. ( |TV| i at different levels) as a byproduct, they are also applicable to the numerous applications 
of parametric graph cuts SUED. 


5 Implementation details 

The algorithms are inherently parallel by design as each projection/reflection onto a chain graph is indepen¬ 
dent of the other. Our implementation assumes decomposed functions and the decomposition depends on the 
problem at hand. In Section [3Tj we described some possible decompositions of grid-like graph structures on 
2D and 3D graphs. However, this extends to many other decompositions. Empirically and theoretically 041 . 
longer connected structures lead to faster convergence than decomposition e.g. into single edges. 


5.1 Parallelization 


We use the efficient ID-TV implementation of |[3] to solve the projection/reflection on chains in parallel. 
Our implementation is in C++ and uses OpenMP; it ensures that the memory access pattern across threads 
is streamlined, since bad memory access patterns can lead to considerable slowdowns. While the ID-TV 
solver is not optimized for GPUs, as explained in Section 3.2 it can be replaced by message passing based 
subroutines, which are inherently parallel and also friendly to GPU architectures. 
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Figure 3: Speedup of the AAR algorithm with parallelization and warm starts, (a) Normalized scale of 
performance with increasing number of cores on Bunny datasets m of different resolutions, (b) Number of 
iterations taken by each from of a video with ’’normal” initialization and ’’warm start” from the dual variables 
of the previous frame. 


5.2 Memory footprint 

In our implementation every decomposable function must maintain states of dual variables for each node in 
the graph. Thus, the memory requirement of our methods increases bilinearly in the number of decomposed 
functions and the number of nodes. Our experiments suggest that the projection-based algorithms require 
less memory than standard combinatorial algorithms—see Table [T] Unlike the 32 bit integers used in many 
other implementations, we use (64 bit) double precision numbers. Reducing those to 32 bit would reduce the 
memory requirements of the projection methods even further. 


Algorithm 

aar bk na 

IBFS iH 

hpf m 

Memory (GB) 

26.82 42.83 

44.16 

55.46 


Table 1: Memory footprint for Abdomen dataset (512 x 512 x 551) 


5.3 Running time 

With the TV subroutine we use, each projection/reflection step scales in the worst case quadratically in the 
length of the chain (the chain length is typically equal to yfn or <]n, where n is the total number of nodes in 
the 2D or 3D graphs), but is in practice usually linear @. In fact, it did not scale quadratically for any of our 
data. Hence, empirically, the cost of each iteration grows bilinearly with the number / of functions /), and 
with the number of nodes in the graph. 


6 Experiments 


Our experiments study the performance of the projection algorithms on 2D and 3D-maxflow datasets JT), 
exploiting in particular the parallel nature of the algorithms. We compare the algorithms to standard, popular 
maxflow implementations such as BK iflOl , IBFS fTTi . and HPF |fl3l . For other algorithms [42]|32j, we have 
not been able to find implementations that were easily portable to 3D datasets. 
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Table [2] shows running time and the number of iterations for the projection algorithms and others on a mul¬ 
ticore machin^] The timings are recorded using the gettimeofday command. All the timings are for the 
optimization phase only and exclude data I/O (which is common to all methods). 


6.1 2D problems 


As a 2D example, we use the tsukuba data m, a multi-label task on 2D images corresponding to 4-neighborhood 
grids. To cope with multiple labels, we use alpha expansion El. Notably, the decomposition for 2D 4- 
connected grids uses only two functions (vertical and horizontal), and therefore corresponds to the special 
dual for r = 2 in Section 3.3 For this formulation, AAR converges remarkably faster than other iterative 


algorithms, and even outperforms the combinatorial methods. The time comparisons at the bottom of Table[2] 
show that in general, the running times of our methods are comparable to standard combinatorial algorithms 
on images of 384 x 288 size. 


Warm starts. Figure [3] b) shows the number of iterations required for the algorithm to converge on each 
frame of size 480 x 360 of the VideoSegA (T) dataset. These are the consecutive frames of a video, which 
are 2D images with 8-neighborhood grids (see Figure [TJ. We use the dual variables at the convergence of 
the previous frame to warm-start the projection/reflection process. This makes the method converge to the 
optimal solution substantially quicker than with other initializations. 


6.2 3D problems 

On the 3D data, the running times of the algorithms differs more widely. In particular, the number of iterations 
for the algorithms to converge appears to depend on two important characteristics: (i) number of nodes in 
the graph (dimensionality) (ii) the edge weights in the graph (weights of the TV term). The latter affects the 
size of the polytopes K, i.e., the diameter of the domain of the dual problem, a parameter that commonly 
influences the convergence of convex optimization methods (see also El). 


Effect of edge weights. Table[2]shows results for larger edge weights, as well as weights scaled by a factor 
of 0.1. The iterative methods become faster with smaller weights, while the combinatorial methods robustly 
perform well with large weights too. On many instances, AAR converges faster than the cyclic BCD, while 
on others BCD is faster. 


Approximate solutions. Since we can obtain a feasible solution (discrete cut) from any iterate of the pro¬ 
jection methods by thresholding the continuous vector. Table [2] also shows the time taken to obtain an ap¬ 
proximate solution with limited error (10% and 2%, measured by Jaccard distance). The results suggest that, 
while a complete dual certificate of convergence for the discrete problem takes a bit longer, a reasonable 
approximate solution can be obtained fairly quickly. 


Parallel speedup. Figure[3]-(a) shows the speedup of AAR achievable with an increasing number of cores. 
This figure reports the running time on the Bunny dataset (3D) with different resolutions: Large (401 x 396 x 
312), Medium (202 x 199 x 157), and Small (102 x 100 x 79). It is evident that more cores can improve the 
performance of the algorithm considerably. When using GPUs, it is important to consider their limited video 
memory, and hence the algorithms need to have a low memory footprint to perform well. 

*20 core. Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz with lOOGigabytes of memory. We only use up to 16 cores of the machine 
to ensure accurate timings. 
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Time in seconds Iterations 
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taken for the algorithm to find a cut whose difference to the optimal cut is p% of the difference between the cut in the first iteration and the optimal cut. 
AAR-JD(< p ) denotes time taken by the algorithm to reduce Jaccard Distance to p. AAR(O.lx) is the number of iterations taken by AAR after scaling the 
pairwise weights by 0.1. 















































Memory. Apart from running time, we also investigate the memory footprint of the algorithms. Table [T] 
shows the memory footprint of all algorithms on the Abdomen data CD , which is 512 x 512 x 551. AAR uses 
considerably less memory than the standard algorithms. 


7 Conclusion and Future work 

We have proposed parallel iterative algorithms for binary energy minimization problems. The algorithms rely 
on a fast projection subroutine. For binary submodular potentials (graph cuts), this subroutine is simply a total 
variation problem, which can be efficiently solved on sub-graphs with special structure. In other examples, 
these subroutines could be fast algorithms for solving cuts on arbitrary subgraphs, or for simpler submodular 
energies. Hence, while the experiments here concentrate on cuts and decompositions into line graphs, the 
same methods apply to decompositions into 2D sheets, 3D cubes or any other subgraphs, and to sums of 
simple higher-order potentials. 

We observed that the iterative methods perform similarly to combinatorial methods on 2D grid graphs, and 
require less memory than other, popular implementations of maximum flow algorithms. The tradeoffs be¬ 
tween! convex and combinatorial methods illustrated here have some interesting implications, and suggest a 
wider study of integrating combinatorial and convex methods via different decompositions. For example, in¬ 
stead of TV oracles for line graphs, one may use oracles for larger specialized subgraphs. These oracles could 
use algorithms such as BK, HPF or IBFS, since the projection and TV oracle can be solved by parametric 
graph cuts. A 3D tensor is easily decomposed into two components: grids and lines. Those more complex 
subroutines can still be invoked in parallel using AAR. Thus, one can combine convex and combinatorial 
methods to greatly benefit from the strengths of both. 

The convex algorithms admit stochastic variants too m However, the decompositions used in the experi¬ 
ments here (Figure [TJ only use a decomposition into 2-4 functions. Finally, given the remarkably improved 
behavior of iterative methods for smaller weights, it is of great empirical interest to study algorithms that 
can use homotopies or continuation techniques to start by solving with lower weights and use the ensuing 
solutions to speed up the medium and high weight regimes. 
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